Smart Data Analysis and Statistics

Predicting personalised absolute treatment effects in individual participant data meta-analysis: an introduction to splines

Background: Modelling non-linear associations between an outcome and continuous patient characteristics, whilst investigating heterogeneous treatment effects, is one of the opportunities offered by individual participant data meta-analysis (IPD-MA). Splines offer great flexibility, but guidance is lacking.

Objective: To introduce modelling of non-linear associations using restricted cubic splines (RCS), natural B-splines, P-splines, and smoothing splines in IPD-MA to estimate absolute treatment effects.

Methods: We describe the pooling of spline-based models using pointwise and multivariate meta-analysis (two-stage methods) and one-stage generalised additive mixed effects models (GAMMs). We illustrate their performance on three IPD-MA scenarios of five studies each: one where only the associations differ across studies, one where only the ranges of the effect modifier differ, and one where both differ. We also evaluate the approaches in an empirical example, modelling the risk of fever and/or ear pain in children with acute otitis media conditional on age.

Results: In the first scenario, all pooling methods showed similar results. In the second and third scenarios, pointwise meta-analysis was flexible but showed non-smooth results and wide confidence intervals; multivariate meta-analysis failed to converge with RCS, but was efficient with natural B-splines. GAMMs produced smooth pooled regression curves in all settings. In the empirical example, results were similar to those of the second and third scenarios, except for multivariate meta-analysis with RCS, which now converged.

Conclusion: We provide guidance on the use of splines in IPD-MA to capture heterogeneous treatment effects in the presence of non-linear associations, thereby facilitating estimation of absolute treatment effects to enhance personalised healthcare.

Journal: Res Synth Methods

Year: 2022

Citation: 4
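The pointwise meta-analysis mentioned in the spline abstract above can be sketched as follows. This is a minimal illustration, assuming fixed-effect inverse-variance weights at each grid point of the effect modifier; the paper may use random-effects weights instead. The function name `pool_pointwise` and all numbers are hypothetical and not taken from the paper.

```python
import math

def pool_pointwise(estimates, std_errors):
    """Fixed-effect (inverse-variance) pooling, one grid point at a time.

    estimates[i][j] and std_errors[i][j] hold study i's fitted value and its
    standard error at grid point j; None marks a grid point outside the
    range of the effect modifier observed in that study.
    """
    n_grid = len(estimates[0])
    pooled, pooled_se = [], []
    for j in range(n_grid):
        weight_sum = 0.0
        weighted_est = 0.0
        for est_row, se_row in zip(estimates, std_errors):
            if est_row[j] is None or se_row[j] is None:
                continue  # this study does not cover grid point j
            w = 1.0 / se_row[j] ** 2
            weight_sum += w
            weighted_est += w * est_row[j]
        if weight_sum == 0.0:
            # no study covers this grid point, so nothing can be pooled
            pooled.append(None)
            pooled_se.append(None)
        else:
            pooled.append(weighted_est / weight_sum)
            pooled_se.append(math.sqrt(1.0 / weight_sum))
    return pooled, pooled_se

# Hypothetical study-specific curves evaluated at three ages (illustration only).
estimates = [[0.2, 0.4, None], [0.4, 0.6, 0.8]]
std_errors = [[0.1, 0.1, None], [0.1, 0.2, 0.2]]
pooled, pooled_se = pool_pointwise(estimates, std_errors)
```

Because each grid point is pooled separately, a point covered by fewer studies gets a larger pooled standard error, and neighbouring points can jump when a study enters or leaves the pool. That is consistent with the wide confidence intervals and non-smooth curves reported when the ranges of the effect modifier differ.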

Individual participant data (IPD) from multiple sources allows external validation of a prognostic model across multiple populations. Often this reveals poor calibration, potentially causing poor predictive performance in some populations. However, rather than discarding the model outright, it may be possible to modify the model to improve performance using recalibration techniques. We use IPD meta-analysis to identify the simplest method to achieve good model performance. We examine four options for recalibrating an existing time-to-event model across multiple populations: (i) shifting the baseline hazard by a constant, (ii) re-estimating the shape of the baseline hazard, (iii) adjusting the prognostic index as a whole, and (iv) adjusting individual predictor effects. For each strategy, IPD meta-analysis examines (heterogeneity in) model performance across populations. Additionally, the probability of achieving good performance in a new population can be calculated, allowing the recalibration methods to be ranked. In an applied example, IPD meta-analysis revealed that the existing model had poor calibration in some populations and large heterogeneity across populations. However, re-estimation of the intercept substantially improved the expected calibration in new populations and reduced between-population heterogeneity. Comparing recalibration strategies showed that re-estimating both the magnitude and shape of the baseline hazard gave the highest predicted probability of good performance in a new population. In conclusion, IPD meta-analysis allows a prognostic model to be externally validated in multiple settings, and enables recalibration strategies to be compared and ranked to decide on the least aggressive recalibration strategy that achieves acceptable external model performance without discarding existing model information.
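The "probability of achieving good performance in a new population" can be illustrated with a small sketch. It assumes a random-effects meta-analysis has already produced a pooled performance estimate, its standard error, and a between-population standard deviation tau, and it approximates the predictive distribution for a new population by a normal distribution (a t-distribution is a common alternative). The function names, the 0.9 to 1.1 window for the calibration slope, and all numbers are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))

def prob_good_performance(pooled_mean, pooled_se, tau, lower, upper):
    """Approximate P(lower < performance in a new population < upper).

    The predictive standard deviation combines the uncertainty in the pooled
    estimate with the between-population heterogeneity tau.
    """
    pred_sd = sqrt(tau ** 2 + pooled_se ** 2)
    return (normal_cdf(upper, pooled_mean, pred_sd)
            - normal_cdf(lower, pooled_mean, pred_sd))

# Hypothetical calibration-slope summaries for two recalibration strategies:
# (pooled mean, standard error of the pooled mean, tau).
strategies = {
    "shift baseline hazard": (0.95, 0.04, 0.10),
    "re-estimate baseline shape": (1.00, 0.03, 0.04),
}
ranking = sorted(
    strategies,
    key=lambda name: prob_good_performance(*strategies[name], 0.9, 1.1),
    reverse=True,
)
```

Under these made-up inputs, the strategy with the smaller heterogeneity ranks first, which mirrors how the abstract describes ranking recalibration methods by their predicted probability of good performance.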

Aims: Use of prediction models is widely recommended by clinical guidelines, but usually requires complete information on all predictors, which is not always available in daily practice. We aim to describe two methods for real-time handling of missing predictor values when using prediction models in practice.

Methods and results: We compare the widely used method of mean imputation (M-imp) to a method that personalizes the imputations by taking advantage of the observed patient characteristics. These characteristics may include both prediction model variables and other characteristics (auxiliary variables). The method was implemented using imputation from a joint multivariate normal model of the patient characteristics (joint modelling imputation; JMI). Data from two different cardiovascular cohorts with cardiovascular predictors and outcome were used to evaluate the real-time imputation methods. We quantified the prediction model's overall performance [mean squared error (MSE) of the linear predictor], discrimination (c-index), calibration (intercept and slope), and net benefit (decision curve analysis). When compared with mean imputation, JMI substantially improved the MSE (0.10 vs. 0.13), c-index (0.70 vs. 0.68), and calibration (calibration-in-the-large: 0.04 vs. 0.06; calibration slope: 1.01 vs. 0.92), especially when incorporating auxiliary variables. When the imputation method was based on an external cohort, calibration deteriorated, but discrimination remained similar.

Conclusions: We recommend JMI with auxiliary variables for real-time imputation of missing values, and we recommend updating imputation models when implementing them in new settings or (sub)populations.
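The contrast between mean imputation and imputation from a joint multivariate normal model can be sketched with the standard conditional-mean formula for a multivariate normal distribution. This is one plausible reading of JMI, not the authors' implementation: it assumes the mean vector and covariance matrix have already been estimated from a development cohort, and it fills missing values with their conditional expectation given the observed values. The function names, variables, and numbers are hypothetical.

```python
import numpy as np

def mean_impute(x, mean):
    """Replace each missing entry (np.nan) by the population mean."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), np.asarray(mean, dtype=float), x)

def conditional_mean_impute(x, mean, cov):
    """Replace missing entries by E[x_missing | x_observed] under a
    multivariate normal model with the given mean and covariance."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    miss = np.isnan(x)
    obs = ~miss
    filled = x.copy()
    if not miss.any():
        return filled
    if not obs.any():
        # nothing observed: the conditional mean is the marginal mean
        filled[miss] = mean[miss]
        return filled
    cov_oo = cov[np.ix_(obs, obs)]   # covariance among observed variables
    cov_mo = cov[np.ix_(miss, obs)]  # covariance of missing with observed
    # mu_m + S_mo S_oo^{-1} (x_o - mu_o)
    filled[miss] = mean[miss] + cov_mo @ np.linalg.solve(cov_oo, x[obs] - mean[obs])
    return filled

# Hypothetical model for (age in years, systolic blood pressure in mmHg).
mean = [60.0, 130.0]
cov = [[100.0, 50.0], [50.0, 400.0]]
patient = [70.0, np.nan]  # blood pressure not measured

m_imp = mean_impute(patient, mean)                    # blood pressure set to 130
jmi = conditional_mean_impute(patient, mean, cov)     # blood pressure set to 135
```

In this toy example the personalised imputation shifts the missing blood pressure upward because the patient is older than average and the two variables are positively correlated. Auxiliary variables would enter simply as extra rows and columns of the mean vector and covariance matrix.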

Missing data present challenges for development and real-world application of clinical prediction models. While these challenges have received considerable attention in the development setting, there is only sparse research on the handling of missing data in applied settings. The main distinguishing feature of handling missing data in these settings is that missing data methods have to be performed for a single new individual, precluding direct application of mainstay methods used during model development. Correspondingly, we propose that it is desirable to perform model validation using missing data methods that transfer to practice in single new patients. This article compares existing and new methods to account for missing data for a new individual in the context of prediction. These methods are based on (i) submodels based on observed data only, (ii) marginalization over the missing variables, or (iii) imputation based on fully conditional specification (also known as chained equations). They were compared in an internal validation setting to highlight the use of missing data methods that transfer to practice while validating a model. As a reference, they were compared to the use of multiple imputation by chained equations in a set of test patients, because this has been used in validation studies in the past. The methods were evaluated in a simulation study where performance was measured by means of the optimism-corrected C-statistic and mean squared prediction error. Furthermore, they were applied to data from a large Dutch cohort of prophylactic implantable cardioverter defibrillator patients.
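Option (i) above, submodels based on observed data only, can be sketched as refitting the prediction model on just the predictors that are available for the new individual. The example below is a simplification under stated assumptions: it uses ordinary least squares on a continuous outcome for brevity, whereas a clinical prediction model would more typically be a logistic or survival model, and the function names and simulated data are hypothetical.

```python
import numpy as np

def fit_submodel(X_train, y_train, observed):
    """Least-squares fit (with intercept) using only the observed columns."""
    X_sub = np.column_stack([np.ones(len(X_train)), X_train[:, observed]])
    coef, *_ = np.linalg.lstsq(X_sub, y_train, rcond=None)
    return coef

def predict_with_submodel(X_train, y_train, x_new):
    """Predict for one new individual whose missing predictors are np.nan."""
    x_new = np.asarray(x_new, dtype=float)
    observed = ~np.isnan(x_new)  # pattern of available predictors
    coef = fit_submodel(X_train, y_train, observed)
    return coef[0] + x_new[observed] @ coef[1:]

# Simulated development data (illustration only): two predictors, no noise.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))
y_train = 1.0 + 2.0 * X_train[:, 0] + 3.0 * X_train[:, 1]

complete_case = predict_with_submodel(X_train, y_train, [1.0, 2.0])
second_missing = predict_with_submodel(X_train, y_train, [1.0, np.nan])
```

In practice the submodels for each missingness pattern would be fitted once during development and stored, so that a prediction for a new patient needs no access to the development data.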

History

Dealing with missing data using the Heckman selection model: methods primer for epidemiologists

Journal: Int. J. Epidemiol.

Year: 2023

Citation: 1

Predicting personalised absolute treatment effects in individual participant data meta-analysis: an introduction to splines

Journal: Res Synth Methods

Year: 2022

Citation: 4

Individual participant data meta-analysis for external validation, recalibration and updating of a flexible parametric prognostic model

Journal: Stat Med

Year: 2021

Citation: 9

Real-time imputation of missing predictor values in clinical practice

Journal: EHJ-DH

Year: 2020

Citation: 7

Handling missing predictor values when validating and applying a prediction model to new patients

Journal: Stat Med

Year: 2020

Citation: 25

Multiple imputation for multilevel data with continuous and binary variables

Journal: Stat Sci

Year: 2018

Citation: 76

Books